- Review of t-tests, confidence intervals and prediction intervals
- Review of bootstrap and permutation
- Application to linear models
W6.C1
Counts on the log scale
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 0.1004 | 0.0482 | 2.086 | 0.0406 |
| M2008 | 0.9010 | 0.0491 | 18.350 | 0.0000 |
Model is \(log10(M2012+1)=\) 0.1004 \(+\) 0.901 \(log10(M2008+1)+\varepsilon\).
Write down the formula that was used to get the test statistic for the slope parameter.
\[\frac{b_1}{SE(b_1)}\]
where
\[SE(b_1)=\frac{MSE}{\sqrt{\sum_{i=1}^n(x_i - \bar{x})^2}}\]
and
\[ MSE=\sqrt{\frac{\sum_{i=1}^n (y_i - \hat{y}_i)^2}{(n - 2)}}\]
Check the numbers in the table.
\[H_o: \beta_1 = 0~~vs~~H_a: \beta_1 \neq 0\]
Decision: p-value is very small (twice the area to the right of red line), reject \(H_o\)
Conclusion: The slope parameter for the regression model using the entire population is not 0.
\[b_1 \pm t_{\alpha/2, n-2} SE(b_1)\]
For \(\alpha=0.05\), yielding 95% confidence level, \(n=\) 73, \(t_{\alpha/2, n-2}=\) 1.9939,
0.901 \(\pm\) 1.9939 \(\times\) 0.0491 = (0.8031 , 0.9989)
Explanation: We are 95% sure that the slope of a regression model fitted to the entire population is between 0.8 and 1.0.
For a given value of \(x\), \(\hat{y} \pm t_{\alpha/2, n-2}MSE\sqrt{\frac{1}{n}+\frac{n(x-\bar{X})^2}{n\sum_{i=1}^n(X_i-\bar{X})^2}}\)
For a given value of \(x\), \(\hat{y} \pm t_{\alpha/2, n-2}MSE\sqrt{1+\frac{1}{n}+\frac{n(x-\bar{X})^2}{n\sum_{i=1}^n(X_i-\bar{X})^2}}\)
MSE from model fit is 0.1838.
For regression, to test \(H_o\), one column of the two is permuted to break association.
df <- data.frame(x=letters[1:5], y=letters[1:5]) head(data.frame(df, py=sample(df$y))) #> x y py #> 1 a a b #> 2 b b d #> 3 c c c #> 4 d d e #> 5 e e a
Make many more permutation sets.
p1 <- ggplot(oly, aes(x=M2008, y=M2012)) + geom_point() + ggtitle("Original data")
p2 <- ggplot(oly, aes(x=M2008, y=sample(M2012))) + geom_point() + ggtitle("Permuted response")
grid.arrange(p1, p2, ncol=2)
Red lines indicate values from our data, which are far from the values obtained from the permuted data.
orig <- letters[1:10] orig #> [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" boot1 <- sort(sample(orig, replace=TRUE)) boot1 #> [1] "a" "b" "f" "f" "g" "h" "h" "i" "i" "i"
#> label l u #> 1 classical 0.8031 0.9989 #> 2 bootstrap 0.8019 0.9869
Procedure derives from bootstrapping residuals.
#> Observations: 67 #> Variables: 17 #> $ County <chr> "ALACHUA", "BAKER", "BAY", "BRADFORD", "BRE... #> $ Palm_Beach <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0... #> $ Population <int> 198326, 20761, 146223, 24646, 460977, 14707... #> $ Log_Population <dbl> 12.198, 9.941, 11.893, 10.112, 13.041, 14.2... #> $ Black <dbl> 21.8, 16.8, 12.4, 22.9, 9.2, 17.5, 16.9, 4.... #> $ Hispanic <dbl> 4.7, 1.5, 2.4, 2.6, 4.1, 10.9, 1.6, 3.4, 2.... #> $ age 65 (%) <dbl> 9.428, 7.697, 11.882, 11.819, 16.462, 20.32... #> $ college <dbl> 34.6, 5.7, 15.7, 8.1, 20.4, 18.8, 8.2, 13.4... #> $ Income (Thousands) <dbl> 26.60, 27.61, 26.85, 25.28, 33.28, 31.26, 2... #> $ Income (Dollars) <int> 26597, 27614, 26846, 25277, 33284, 31264, 2... #> $ Age 65 (total) <int> 18698, 1598, 17374, 2913, 75888, 298900, 17... #> $ Gore <int> 47365, 2392, 18850, 3075, 97318, 386561, 21... #> $ Bush <int> 34124, 5610, 38637, 5414, 115185, 177323, 2... #> $ Buchanan <int> 262, 73, 248, 65, 570, 789, 90, 182, 270, 1... #> $ Nader <int> 3215, 53, 828, 84, 4470, 7099, 39, 1462, 13... #> $ Total Votes <int> 84966, 8128, 58563, 8638, 217543, 571772, 5... #> $ Log_Buchanan <dbl> -1.1765, -0.1074, -0.8593, -0.2844, -1.3393...
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 2.1465 | 0.3955 | 5.428 | 0.0000 |
age 65 (%) |
-0.0415 | 0.0070 | -5.939 | 0.0000 |
| Black | -0.0132 | 0.0046 | -2.884 | 0.0054 |
| Hispanic | -0.0350 | 0.0051 | -6.807 | 0.0000 |
| college | -0.0193 | 0.0097 | -1.991 | 0.0510 |
Income (Thousands) |
-0.0658 | 0.0144 | -4.582 | 0.0000 |
pb <- florida %>% filter(County=="PALM BEACH") pb_p <- predict(florida_lm, pb) pb_e <- pb$Log_Buchanan - pb_p kable(cbind(pb$Log_Buchanan, pb_p, pb_e))
| pb_p | pb_e | |
|---|---|---|
| -0.2365 | -2.003 | 1.766 |
This work is licensed under the Creative Commons Attribution-Noncommercial 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/ 3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.